{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Lesson 3 Demo 4: Using the WHERE Clause\n",
    "<img src=\"images/cassandralogo.png\" width=\"250\" height=\"250\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### In this exercise we are going to walk through the basics of using the WHERE clause in Apache Cassandra.\n",
    "\n",
    "##### denotes where the code needs to be completed.\n",
    "\n",
    "Note: __Do not__ click the blue Preview button in the lower task bar"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### We will use a python wrapper/ python driver called cassandra to run the Apache Cassandra queries. This library should be preinstalled but in the future to install this library you can run this command in a notebook to install locally: \n",
    "! pip install cassandra-driver\n",
    "#### More documentation can be found here:  https://datastax.github.io/python-driver/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Import Apache Cassandra python package"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import cassandra"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### First let's create a connection to the database"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "from cassandra.cluster import Cluster\n",
    "try: \n",
    "    cluster = Cluster(['127.0.0.1']) #If you have a locally installed Apache Cassandra instance\n",
    "    session = cluster.connect()\n",
    "except Exception as e:\n",
    "    print(e)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Let's create a keyspace to do our work in "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "try:\n",
    "    session.execute(\"\"\"\n",
    "    CREATE KEYSPACE IF NOT EXISTS udacity \n",
    "    WITH REPLICATION = \n",
    "    { 'class' : 'SimpleStrategy', 'replication_factor' : 1 }\"\"\"\n",
    ")\n",
    "\n",
    "except Exception as e:\n",
    "    print(e)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Connect to our Keyspace. Compare this to how we had to create a new session in PostgreSQL.  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "try:\n",
    "    session.set_keyspace('udacity')\n",
    "except Exception as e:\n",
    "    print(e)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Let's imagine we would like to start creating a new Music Library of albums. \n",
    "### We want to ask 4 question of our data\n",
    "#### 1. Give me every album in my music library that was released in a 1965 year\n",
    "#### 2. Give me the album that is in my music library that was released in 1965 by \"The Beatles\"\n",
    "#### 3. Give me all the albums released in a given year that was made in London \n",
    "#### 4. Give me the city that the album \"Rubber Soul\" was recorded"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Here is our Collection of Data\n",
    "<img src=\"images/table3.png\" width=\"650\" height=\"350\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### How should we model this data? What should be our Primary Key and Partition Key? Since our data is looking for the YEAR let's start with that. From there we will add clustering columns on Artist Name and Album Name."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "query = \"CREATE TABLE IF NOT EXISTS music_library \"\n",
    "query = query + \"(year int, artist_name text, album_name text, city text, PRIMARY KEY (year, artist_name, album_name))\"\n",
    "try:\n",
    "    session.execute(query)\n",
    "except Exception as e:\n",
    "    print(e)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Let's insert our data into of table"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "query = \"INSERT INTO music_library (year, artist_name, album_name, city)\"\n",
    "query = query + \" VALUES (%s, %s, %s, %s)\"\n",
    "\n",
    "try:\n",
    "    session.execute(query, (1970, \"The Beatles\", \"Let it Be\", \"Liverpool\"))\n",
    "except Exception as e:\n",
    "    print(e)\n",
    "    \n",
    "try:\n",
    "    session.execute(query, (1965, \"The Beatles\", \"Rubber Soul\", \"Oxford\"))\n",
    "except Exception as e:\n",
    "    print(e)\n",
    "    \n",
    "try:\n",
    "    session.execute(query, (1965, \"The Who\", \"My Generation\", \"London\"))\n",
    "except Exception as e:\n",
    "    print(e)\n",
    "\n",
    "try:\n",
    "    session.execute(query, (1966, \"The Monkees\", \"The Monkees\", \"Los Angeles\"))\n",
    "except Exception as e:\n",
    "    print(e)\n",
    "\n",
    "try:\n",
    "    session.execute(query, (1970, \"The Carpenters\", \"Close To You\", \"San Diego\"))\n",
    "except Exception as e:\n",
    "    print(e)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Let's Validate our Data Model with our 4 queries.\n",
    "\n",
    "Query 1: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1965 The Beatles Rubber Soul Oxford\n",
      "1965 The Who My Generation London\n"
     ]
    }
   ],
   "source": [
    "query = \"SELECT * FROM music_library WHERE YEAR=1965\"\n",
    "try:\n",
    "    rows = session.execute(query)\n",
    "except Exception as e:\n",
    "    print(e)\n",
    "    \n",
    "for row in rows:\n",
    "    print (row.year, row.artist_name, row.album_name, row.city)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " Let's try the 2nd query.\n",
    " Query 2: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1965 The Beatles Rubber Soul Oxford\n"
     ]
    }
   ],
   "source": [
    "query = \"SELECT * FROM music_library WHERE YEAR = 1965 AND ARTIST_NAME = 'The Beatles'\"\n",
    "try:\n",
    "    rows = session.execute(query)\n",
    "except Exception as e:\n",
    "    print(e)\n",
    "    \n",
    "for row in rows:\n",
    "    print (row.year, row.artist_name, row.album_name, row.city)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Let's try the 3rd query.\n",
    "Query 3: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Error from server: code=2200 [Invalid query] message=\"Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING\"\n"
     ]
    }
   ],
   "source": [
    "query = \"SELECT * FROM music_library WHERE YEAR = 1965 AND ARTIST_NAME = 'The Beatles' AND CITY = 'London'\"\n",
    "try:\n",
    "    rows = session.execute(query)\n",
    "except Exception as e:\n",
    "    print(e)\n",
    "    \n",
    "for row in rows:\n",
    "    print (row.year, row.artist_name, row.album_name, row.city)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Did you get an error? You can not try to access a column or a clustering column if you have not used the other defined clustering column. Let's see if we can try it a different way. \n",
    "Try Query 4: \n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Oxford\n"
     ]
    }
   ],
   "source": [
    "query = \"SELECT city FROM music_library WHERE YEAR = 1965 AND ARTIST_NAME = 'The Beatles' AND ALBUM_NAME = 'Rubber Soul'\"\n",
    "try:\n",
    "    rows = session.execute(query)\n",
    "except Exception as e:\n",
    "    print(e)\n",
    "    \n",
    "for row in rows:\n",
    "    print (row.city)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### And Finally close the session and cluster connection"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "session.shutdown()\n",
    "cluster.shutdown()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
